Friendlier workflow errors (consolidated)#1849

Merged
pranaygp merged 34 commits into main from pranaygp/friendlier-errors-followups
May 4, 2026

Conversation

@pranaygp
Contributor

@pranaygp pranaygp commented Apr 24, 2026

Summary

Consolidates the eight-PR friendlier-errors stack into a single PR. Inspired by @Schniz's stalled #706. Superseded by this PR: #1831, #1832, #1836, #1837, #1838, #1839, #1840.

What's included

| Area | Change |
| --- | --- |
| @workflow/errors | New SerializationError, WorkflowBuildError classes (with optional hint field). Ansi rendering helpers (frame, code, docs, dim, inline), which now live under the @workflow/errors/ansi subpath so the main entry doesn't pull chalk into every consumer. FatalError.is(err) widened to recognize any error with a fatal: true own property. |
| @workflow/core: context violations | Four structured error classes (NotInWorkflowContextError, NotInStepContextError, NotInWorkflowOrStepContextError, UnavailableInWorkflowContextError) applied to twelve user-facing throw sites. Each includes a docs link. .message / .stack are plain text; the colored framed form renders lazily via [util.inspect.custom] / toString(), so structured logs and log drains no longer contain raw \x1B[...m bytes. All four classes set fatal = true, so createHook()-from-a-step fails immediately instead of burning three retry attempts. Thrown errors redirect their stack to the user's call site via a shared redirectStackToCaller helper so terminal overlays (Next.js, Turbopack, VS Code) point at user code. |
| @workflow/core: serialization | SerializationError applied to all user-facing serialization boundaries: stream locking, unregistered classes, missing WORKFLOW_DESERIALIZE, step-function / workflow-function misuse, and dehydrate/hydrate failures for workflow args, step args, and return values. |
| @workflow/core: runtime logger | Structured logger gains .child() and .forRun(runId, workflowName) for stable per-run context, standardized [workflow-sdk] prefix, error stacks surfaced in log drains, clarified replay-timeout phrasing (warn while retrying vs. error when giving up). |
| @workflow/core: attribution | describeError(err) and describeRunError({ errorCode, errorName }) compute user-vs-SDK attribution + class-aware hints, either from a live Error instance or from persisted failure-event fields. Exposed under the public @workflow/core/describe-error subpath for CLI / web consumption. Terminal logs at step-failure, max-retries, run-failure, and fatal-setup sites now include errorAttribution metadata and hint text. |
| @workflow/core: consistency | Remaining internal invariants (missing startedAt, VM crypto.subtle.generateKey, closure-vars outside a step context, ENOTSUP) now throw WorkflowRuntimeError so they are attributed to the SDK. defineHook().resume() formats schema validation failures as a readable bulleted list instead of a raw JSON dump. |
| @workflow/builders: build-time | Failed esbuild phases, unresolved built-in steps, and empty esbuild output now throw WorkflowBuildError with a hint pointing at the likely fix. |

Fixes from manual testing (createHook() inside a step)

Running the earlier stack end-to-end surfaced three issues, addressed in the final commits:

  1. ANSI bytes were leaking into .message because chalk resolves at construction time — fixed by storing plain text and rendering pretty lazily (see context-errors-plain-message changeset).
  2. Context violations were retrying 3× and producing 4 duplicated log blocks — fixed by marking them fatal: true and widening FatalError.is() (see context-errors-fatal changeset).
  3. Duplicated captureStackTrace feature-detect in two files — fixed by extracting a shared redirectStackToCaller helper (see capture-stack-shared changeset).
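
The first fix above (plain .message, pretty form rendered lazily) can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: PlainMessageError, renderPretty, and the red helper are hypothetical names, and the example URL is a placeholder. The only external fact relied on is that Node's util.inspect.custom is the registered symbol Symbol.for('nodejs.util.inspect.custom').

```typescript
// Hypothetical sketch: class and helper names are illustrative, not the SDK's API.
// util.inspect.custom is the registered symbol below, so no import is required.
const inspectCustom = Symbol.for('nodejs.util.inspect.custom');

// Stand-in for a chalk-style helper; real code would call a color library here.
const red = (text: string): string => `\x1B[31m${text}\x1B[39m`;

class PlainMessageError extends Error {
  readonly docsUrl: string;

  constructor(title: string, docsUrl: string) {
    // Only plain text is stored, so .message and .stack carry no ANSI bytes.
    super(`${title}\n╰▶ docs: ${docsUrl}`);
    this.name = 'PlainMessageError';
    this.docsUrl = docsUrl;
  }

  // Color is applied only when a display form is requested.
  renderPretty(): string {
    const [title, ...details] = this.message.split('\n');
    return [red(`${this.name}: ${title}`), ...details].join('\n');
  }

  // console.error(err) and util.inspect(err) in Node call this method.
  [inspectCustom](): string {
    return this.renderPretty();
  }
}
```

With this shape, a JSON response built from err.message stays plain, while console.error(err) in a terminal still gets the colored form.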

Follow-up round (latest commits)

Manual testing uncovered two more issues, fixed in the last two commits on this branch:

  1. SerializationError still looped through 4 attempts. Root cause: dehydrateStepReturnValue() was called outside the step-handler try/catch, so FatalError.is() never saw the error. Two-part fix: mark SerializationError as fatal = true, and move the dehydration call inside the user-code try/catch in step-handler.ts so the error routes through userCodeFailed → step_failed (see serialization-error-fatal changeset). Same non-POJO return now fails in ~1.6s / 1 block, not ~21s / 4 blocks.
  2. Snapshot tests for log output. Added toMatchInlineSnapshot-backed tests for describeError() payloads (every attribution path) and the scoped-logger call signature for the two canonical runtime failure sites. Regression gate on the exact field shapes users see in their log drains.
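
The control-flow change in item 1 can be pictured with a simplified, synchronous model of a step handler. Everything here is an assumption made for illustration (runStep, dehydrate, SketchSerializationError, and the outcome shape are hypothetical); the real step-handler.ts is asynchronous and emits events rather than returning them.

```typescript
// Hypothetical, simplified model of a step handler; all names are illustrative.
type StepOutcome =
  | { event: 'step_completed'; value: string }
  | { event: 'step_failed'; error: Error }
  | { event: 'step_retrying'; error: Error };

class SketchSerializationError extends Error {
  readonly fatal = true; // own property that marks the error as non-retryable
}

// Stand-in for dehydrateStepReturnValue(); rejecting Maps and functions is an
// assumption of this sketch, not the SDK's actual serialization rules.
function dehydrate(value: unknown): string {
  if (value instanceof Map || typeof value === 'function') {
    throw new SketchSerializationError('Failed to serialize step return value');
  }
  return JSON.stringify(value);
}

const isFatal = (err: Error): boolean =>
  Object.prototype.hasOwnProperty.call(err, 'fatal') &&
  (err as { fatal?: unknown }).fatal === true;

function runStep(userCode: () => unknown): StepOutcome {
  try {
    const result = userCode();
    // Dehydrating inside the try block means a serialization failure reaches
    // the same catch block (and the same fatal check) as a user-code throw.
    return { event: 'step_completed', value: dehydrate(result) };
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    return isFatal(error)
      ? { event: 'step_failed', error }
      : { event: 'step_retrying', error };
  }
}
```

In this model, runStep(() => new Map()) fails once with step_failed, while a plain thrown Error still goes down the retry path.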

Post-review polish (latest commits)

Driven by manual testing in tmux + reviewer feedback after the initial review pass:

  1. Logger layout overhaul. Framing and structured fields (user error · FatalError, run wrun_…, step step_… · add (./workflows/x)) are now composed between the framing line and the stack body instead of after it, so the most useful info sits at the top of each block rather than buried under 30+ stack lines.
  2. Stack trim. composeLogLine() drops framework-internal frames (node_modules/.pnpm/, node:internal/, Turbopack-bundled node_modules__pnpm_* and _next_dist_* chunks) and caps the surviving frames at 6. Suppressed runs emit one summary line so users know the stack was trimmed.
  3. Friendly step / workflow names in framings. step//./workflows/foo//bar renders as bar (./workflows/foo) everywhere we log a step or workflow name, via new formatStepName / formatWorkflowName helpers in @workflow/utils.
  4. Consistent ╰▶ hint: / ╰▶ docs: framing across all errors with hints or docs slugs. WorkflowError, SerializationError, and WorkflowBuildError now share one appendFramedDetails helper, matching the box-drawing tree ContextViolationError already used. The blank-line-separated Learn more: <url> style is gone.
  5. Single hint, lives on the error message. Dropped the duplicate hint field from runtime logger metadata. Hints get serialized into the event log on the message itself, rehydrate on the workflow side, and surface in observability. The previous logger-only hint duplicated stderr but never made it past the step boundary.
  6. Better serialization-failure hint. Updated to point at the foundations doc instead of a hardcoded (plain objects, arrays, primitives, …) list, which drifted out of sync as the supported types grew. Reused across step args, workflow args/return, stream messages, and any other site that goes through formatSerializationError.
  7. Workbench route no longer double-logs failures. app/api/workflows/start/route.ts's SSE wrapper was catching WorkflowRunFailedError rejection and re-emitting it via console.error('Error in workflow stream:', error) + controller.error(error), which then triggered Next.js's ⨯ failed to pipe response overlay. The SDK already logs the failure cleanly, so the wrapper now closes the stream gracefully on WorkflowRunFailedError.
  8. ErrorStackBlock title trim in web observability. Multi-line error messages (Failed to serialize step return value\n╰▶ hint: …) were rendering the entire framed body in the card title, pushing the copy button off-screen. Title now shows just the first non-empty trimmed line with single-line truncation; full message stays in the body via Error.stack.
  9. Persisted error message no longer embeds the machine step name. Step "step//./.../foo" failed after N retries: … and Step "step//.../foo" exceeded max retries (…) now read Step failed after N retries: … / Step exceeded max retries (…). Observability already attributes the event to a specific step via the UI tree.
  10. Retry summary reads 4 attempts · 3 max retries instead of 3 retries (which was ambiguous next to "4 attempts").
  11. WorkflowError no longer leaks cause: undefined as an enumerable own property when no cause is provided — every subclass's util.inspect(err) output stays clean.
  12. ContextViolationError [util.inspect.custom] now counts message lines correctly when slicing the stack tail, fixing duplicated ╰▶ docs: lines on multi-detail errors.
  13. Step-handler integration tests for fatal vs non-fatal retry-loop wiring: a fatal: true error emits one step_failed and zero step_retrying; a non-fatal Error retries via step_retrying until the budget is exhausted, then emits step_failed. Catches the silent-regression case where fatal = true is removed from a class but FatalError.is() unit tests stay green.
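
As a rough illustration of the stack trim in item 2, the sketch below filters frames against the four patterns named there and caps the survivors at 6. The function name, signature, and summary wording are assumptions, and it simplifies by emitting a single summary line at the end rather than one per suppressed run.

```typescript
// Hypothetical sketch of frame filtering. The patterns come from item 2 above;
// the function name and the summary wording are illustrative assumptions.
const INTERNAL_FRAME_PATTERNS: RegExp[] = [
  /node_modules\/\.pnpm\//,
  /node:internal\//,
  /node_modules__pnpm_/,
  /_next_dist_/,
];
const MAX_FRAMES = 6;

function trimStack(stack: string): string {
  const lines = stack.split('\n');
  const isFrame = (line: string): boolean => line.trimStart().startsWith('at ');

  const firstFrame = lines.findIndex(isFrame);
  if (firstFrame === -1) return stack; // nothing that looks like a frame

  // Everything before the first frame is the (possibly multi-line) message.
  const header = lines.slice(0, firstFrame);
  const frames = lines.slice(firstFrame);

  const userFrames = frames.filter(
    (line) => !INTERNAL_FRAME_PATTERNS.some((pattern) => pattern.test(line)),
  );
  const kept = userFrames.slice(0, MAX_FRAMES);

  // Simplification: one summary line for all hidden frames.
  const hidden = frames.length - kept.length;
  const summary = hidden > 0 ? [`    (${hidden} frames hidden)`] : [];

  return [...header, ...kept, ...summary].join('\n');
}
```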

Addressed review feedback

Manual test plan

All sections below exercise different parts of the stack. Start with cd workbench/nextjs-turbopack && pnpm dev unless otherwise noted.

1. Context-violation errors (phase 1 + 2 + followups)

A convenient smoke route at workbench/nextjs-turbopack/app/api/friendlier-errors-smoke/route.ts:

import { NextResponse } from 'next/server';
import { createHook, sleep, getStepMetadata, getWorkflowMetadata } from 'workflow';

export async function GET(req: Request) {
  const which = new URL(req.url).searchParams.get('which');
  try {
    if (which === 'createHook') createHook();
    if (which === 'sleep') await sleep('1s');
    if (which === 'getStepMetadata') getStepMetadata();
    if (which === 'getWorkflowMetadata') getWorkflowMetadata();
    return NextResponse.json({ ok: true });
  } catch (err) {
    console.error(err);
    return NextResponse.json(
      { name: (err as Error).name, message: (err as Error).message, stack: (err as Error).stack },
      { status: 500 }
    );
  }
}
  • createHook() outside workflow: ?which=createHook. Terminal shows a framed box with title `createHook()` can only be called inside a workflow function and a docs: https://…/workflow/create-hook line (polished; no longer note: Read more about…).
  • sleep() outside workflow: ?which=sleep. Same framing, docs URL ends in .../workflow/sleep.
  • getStepMetadata() in a workflow function (not a step) — add to a "use workflow" file; expect title "can only be called inside a step function".
  • getWorkflowMetadata() in application code: ?which=getWorkflowMetadata. Expect "workflow or step function".
  • resumeHook() inside a workflow: call resumeHook(token, payload) from inside a "use workflow" function. Title: `resumeHook()` cannot be called from a workflow context, plus a line reading "this call was made from the workflow//./src/workflows/example.ts//myWorkflow workflow context." with the workflow/ prefix dimmed.
  • Stack points at user code — in the JSON response body, the first at ... line of err.stack should reference your route handler, not a frame inside @workflow/core. Same goes for the Next.js dev overlay.
  • No functionName leak — the JSON response body should NOT contain a functionName property on the error object (it used to via the old constructor param-property).
  • ANSI bytes don't leak into .message / .stack (new in this PR) — the JSON response body's message and stack strings must contain no \x1B[ bytes; they're plain text. In the terminal, console.error(err) still renders the pretty framed version via util.inspect.
  • Context violations fail fast (no 4× retries) (new in this PR) — call createHook() inside a "use step" function:
    async function add(a: number, b: number): Promise<number> {
      'use step';
      createHook();
      return a + b;
    }
    Run a workflow that calls add(…). You should see one [workflow-sdk] fatal-error log block (Step "X" threw a FatalError — bubbling up...), not four. The step fails immediately.

2. Runtime logger metadata (phase 3)

  • Standardized prefix — all runtime log lines start with [workflow-sdk]. Grep your terminal output; there should be no [Workflows] or other prefixes.
  • Run context attached without repetition — trigger any workflow; log lines carry workflowRunId and workflowName as metadata fields instead of being baked into the message string.
  • Replay-timeout phrasing:
    • While retries remain: warn level, phrasing includes "took too long — will retry".
    • After final attempt: error level, phrasing includes "gave up".
  • Error stack surfaces in log drain — set up a log drain (or pipe to a file) and confirm the stack is included as a structured field (errorStack), not just the one-line error message.
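
To make the "run context attached without repetition" check concrete, here is a minimal sketch of a context-carrying logger with .child() and .forRun(). The class, the sink callback, and the field merging strategy are assumptions for illustration; the SDK's logger is likely to differ in detail.

```typescript
// Hypothetical sketch of a context-carrying logger; not the SDK's implementation.
type Fields = Record<string, unknown>;
type Level = 'warn' | 'error';
type Sink = (level: Level, message: string, fields: Fields) => void;

class SketchLogger {
  private readonly sink: Sink;
  private readonly context: Fields;

  constructor(sink: Sink, context: Fields = {}) {
    this.sink = sink;
    this.context = context;
  }

  // Returns a new logger whose context is the parent's plus the extra fields.
  child(extra: Fields): SketchLogger {
    return new SketchLogger(this.sink, { ...this.context, ...extra });
  }

  // Convenience wrapper so call sites never repeat the run identifiers.
  forRun(workflowRunId: string, workflowName: string): SketchLogger {
    return this.child({ workflowRunId, workflowName });
  }

  warn(message: string, fields: Fields = {}): void {
    this.emit('warn', message, fields);
  }

  error(message: string, fields: Fields = {}): void {
    this.emit('error', message, fields);
  }

  private emit(level: Level, message: string, fields: Fields): void {
    // Run context travels as metadata; the message only gets the fixed prefix.
    this.sink(level, `[workflow-sdk] ${message}`, { ...this.context, ...fields });
  }
}
```

A call such as logger.forRun(runId, name).child({ stepId }).error('Step failed', { errorName }) then produces one record whose fields include the run and step identifiers without them appearing in the message string.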

3. SerializationError (phase 4)

Cause a user-facing serialization failure:

async function stepThatLeaks(): Promise<Map<string, { fn: () => void }>> {
  'use step';
  return new Map([['foo', { fn: () => console.log('unserializable') }]]);
}
  • Run the workflow. Expect [workflow-sdk] log with errorName: 'SerializationError'.
  • errorAttribution: 'user'.
  • Hint is rendered as a ╰▶ hint: line on the error message itself — the URL points at https://workflow-sdk.dev/docs/foundations/serialization (foundations doc, not a hardcoded type list). The hint travels with the persisted event into observability.
  • Stream locking: call getWritable('x').getWriter() twice on the same stream — expect a SerializationError with the same ╰▶ hint: framing.

4. describeError attribution (phase 5)

For each of these, inspect the [workflow-sdk] log at failure time:

  • Plain user Error: throw new Error('boom') from a step → errorAttribution: 'user', no framed hint on the message.
  • SerializationError → errorAttribution: 'user' + ╰▶ hint: line on the message pointing at the foundations doc.
  • Context-violation error → errorAttribution: 'user' + ╰▶ docs: line on the message pointing at the per-API reference page.
  • WorkflowRuntimeError: throw new WorkflowRuntimeError('invariant') from a step → errorAttribution: 'sdk'.
  • Replay timeout — set WORKFLOW_REPLAY_TIMEOUT_MS=50, run a non-trivial workflow → after retries exhaust, errorAttribution: 'sdk', errorCode: 'REPLAY_TIMEOUT'.
  • Max-delivery exhaustion — write a step that always throws; after queue max-delivery budget exhausts → errorAttribution: 'sdk', errorCode: 'MAX_DELIVERIES_EXCEEDED'.

Note: describeError(err) and describeRunError({errorCode, errorName}) still return a hint field for CLI / web consumers, but the runtime logger no longer adds it as a duplicate metadata field. Hints live on the error message instead, so they survive serialization and surface in observability.

5. Consistency pass (phase 6)

  • Zod schema failure on defineHook().resume() — call hook.resume(token, invalidBody). Expect a readable bulleted list of validation issues (one per line, at "field": message), not a raw JSON dump of ZodError.issues.
  • crypto.subtle.generateKey() inside workflow VM — call it from a "use workflow" function. Expect a clear message explaining why it's disabled + "move this into a step function", with errorAttribution: 'sdk'.
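
The bulleted validation output expected from defineHook().resume() could be produced by something like the sketch below. The ValidationIssue shape is a hypothetical stand-in loosely modeled on what schema validators report (a path plus a message), and the exact bullet glyph and indentation are assumptions.

```typescript
// Hypothetical issue shape; real schema libraries expose more fields.
interface ValidationIssue {
  path: Array<string | number>;
  message: string;
}

// One line per issue, in the at "field": message form described above.
function formatIssues(issues: ValidationIssue[]): string {
  return issues
    .map((issue) => {
      const field = issue.path.length > 0 ? issue.path.join('.') : '(root)';
      return `  • at "${field}": ${issue.message}`;
    })
    .join('\n');
}
```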

6. describeError subpath (phase 7 foundation)

Create scratch.ts at repo root:

import { describeRunError, describeError } from '@workflow/core/describe-error';
import { SerializationError, WorkflowRuntimeError } from '@workflow/errors';

console.log(describeRunError({ errorCode: 'USER_ERROR', errorName: 'SerializationError' }));
console.log(describeRunError({ errorCode: 'USER_ERROR', errorName: 'NotInWorkflowContextError' }));
console.log(describeRunError({ errorCode: 'RUNTIME_ERROR' }));
console.log(describeRunError({ errorCode: 'REPLAY_TIMEOUT' }));
console.log(describeRunError({ errorCode: 'MAX_DELIVERIES_EXCEEDED' }));
console.log(describeRunError({ errorCode: 'USER_ERROR' }));

console.log(describeError(new SerializationError('boom')));
console.log(describeError(new WorkflowRuntimeError('invariant')));
console.log(describeError(new Error('plain')));

Run pnpm tsx scratch.ts.

  • describeRunError({ errorCode: 'USER_ERROR', errorName: 'SerializationError' }) → { attribution: 'user', errorCode: 'USER_ERROR', hint: 'A value…serialized…' }.
  • describeRunError({ errorCode: 'RUNTIME_ERROR' }) → { attribution: 'sdk', hint: 'This is an internal workflow SDK error…' }.
  • Live-error parity — describeError(new SerializationError('x')) and describeRunError({ errorCode: 'USER_ERROR', errorName: 'SerializationError' }) return the same shape and hint string.
  • Subpath import works — TypeScript resolves @workflow/core/describe-error and pnpm tsx runs without module-resolution errors.
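
For orientation, the attribution half of these checks can be modeled as below. This is only a sketch under stated assumptions: the type and function names are hypothetical, hint text is deliberately left out, and the live-error variant takes the error code from the caller instead of classifying the error itself.

```typescript
// Hypothetical sketch: names are illustrative. Only the code-to-attribution
// mapping is taken from the expectations listed in this test plan.
type RunErrorCode =
  | 'USER_ERROR'
  | 'RUNTIME_ERROR'
  | 'REPLAY_TIMEOUT'
  | 'MAX_DELIVERIES_EXCEEDED';

type Attribution = 'user' | 'sdk';

interface PersistedFailure {
  errorCode: RunErrorCode;
  errorName?: string;
}

interface AttributionResult {
  attribution: Attribution;
  errorCode: RunErrorCode;
}

// Works from persisted event fields, so no live Error instance is needed.
function attributeRunError(failure: PersistedFailure): AttributionResult {
  const attribution: Attribution = failure.errorCode === 'USER_ERROR' ? 'user' : 'sdk';
  return { attribution, errorCode: failure.errorCode };
}

// Live-error variant. Delegating to the persisted-fields function is what
// keeps the two entry points returning the same shape.
function attributeError(err: Error, errorCode: RunErrorCode): AttributionResult {
  return attributeRunError({ errorCode, errorName: err.name });
}
```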

7. WorkflowBuildError (phase 8)

Exercise the build pipeline, not the runtime. Use workbench/nextjs-turbopack and run pnpm build (not pnpm dev).

  • Syntax error in a workflow file → in the build output, expect a WorkflowBuildError titled "Build failed during workflows bundle" followed by a blank line and hint: Review the esbuild errors above…. The original esbuild errors remain printed above (not suppressed).
  • Unresolved built-in steps: mv node_modules/workflow node_modules/workflow-bak and run pnpm build. Expect WorkflowBuildError: Failed to resolve built-in steps sources. + hint: run `pnpm install workflow`…. Restore afterwards.
  • Empty workflow directory — move all workflow files aside. Expect WorkflowBuildError: No output files generated from esbuild + hint mentioning "use workflow" / "use step" directives.
  • .is() discriminator — in a scratch script: WorkflowBuildError.is(new WorkflowBuildError('x', { hint: 'y' })) returns true; WorkflowBuildError.is(new Error('x')) returns false.
  • Runtime paths unaffected — run a normal workflow at runtime (pnpm dev). Confirm no WorkflowBuildError shows up; this class is build-time only.
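
The .is() discriminator check above can be pictured with a small stand-in class. SketchBuildError is hypothetical, and the name-based comparison inside is() is an assumption about how such a discriminator might work; the real WorkflowBuildError in @workflow/errors may use a different mechanism.

```typescript
// Hypothetical sketch; the real class lives in @workflow/errors and may differ.
class SketchBuildError extends Error {
  readonly hint: string | undefined;

  constructor(message: string, options: { hint?: string } = {}) {
    super(message);
    this.name = 'SketchBuildError';
    this.hint = options.hint;
  }

  // Name-based check (an assumption of this sketch): unlike instanceof, it
  // also recognizes instances created by a second copy of the package.
  static is(value: unknown): value is SketchBuildError {
    return value instanceof Error && value.name === 'SketchBuildError';
  }
}
```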

8. New @workflow/errors/ansi subpath (final PR)

  • import { Ansi } from '@workflow/errors' no longer works (and wasn't intended to — the helpers were always namespaced). Confirm import * as Ansi from '@workflow/errors/ansi' resolves.
  • A package that imports only error classes (import { SerializationError } from '@workflow/errors') no longer pulls chalk into its bundle. Check a production bundle or pnpm why chalk from a dependent context.

9. FatalError.is() widening (final PR)

  • FatalError.is(new FatalError('x')) → true.
  • FatalError.is(new NotInWorkflowContextError('createHook()', 'https://…')) → true (via fatal: true own property).
  • FatalError.is(new Error('x')) → false.
  • FatalError.is({ fatal: true }) → false (must be an Error-shaped value).
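
The four expectations above are consistent with a check along these lines. The classes are hypothetical stand-ins, and using instanceof Error for "Error-shaped" is an assumption; the real FatalError.is() may accept a looser shape.

```typescript
// Hypothetical sketch of the widened check; class names are illustrative.
class SketchFatalError extends Error {
  readonly fatal = true;

  static is(value: unknown): boolean {
    return (
      value instanceof Error && // assumption: "Error-shaped" means instanceof Error
      Object.prototype.hasOwnProperty.call(value, 'fatal') &&
      (value as { fatal?: unknown }).fatal === true
    );
  }
}

// Unrelated class that opts in to fail-fast behavior via an own property.
class SketchContextViolation extends Error {
  readonly fatal = true;
}
```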

Unit tests

All packages typecheck clean; relevant test files pass:

  • pnpm --filter @workflow/errors test — 26 tests (Ansi, SerializationError, WorkflowBuildError, FatalError widening, framed-details consolidation)
  • pnpm --filter @workflow/core exec vitest run src/log-format.test.ts src/logger.test.ts src/context-errors.test.ts src/describe-error.test.ts src/runtime/step-handler.test.ts — 76 tests, including new composeLogLine snapshot tests and the fatal-vs-retryable step-handler integration suite
  • pnpm --filter @workflow/builders test — 129 tests
  • pnpm typecheck — clean across workspace

Smoke-test harness (local)

For reviewers reproducing the manual test plan, a one-shot tmux harness lives at /tmp/wf-1849-smoke.sh (built during testing). Pane layout:

┌─ dev server :3010 ──────────┬─ workflow web :3011 ──┐
│  pnpm dev (next.js + sdk)   │  pnpm workflow web    │
├──────────────────────────────┴───────────────────────┤
│  /tmp/wf-1849-smoke.sh ok | fatal | retryable |     │
│                       serialization | context-step  │
│                       ctx-create | ctx-sleep | ...  │
│                       all                            │
└──────────────────────────────────────────────────────┘

The harness's workbench/nextjs-turbopack/workflows/smoke.ts and app/api/friendlier-errors-smoke/route.ts are kept out of git (workbench-local) — set them up via the workflow-init-style instructions in the prior conversation if you want to reuse the harness.

🤖 Generated with Claude Code

pranaygp and others added 13 commits April 22, 2026 18:28
Phase 1: Add Ansi rendering helpers (frame, hint, note, help, code, inline)
to @workflow/errors, and a chalk mock for readable snapshot tests.

Phase 2: Add four context-violation error classes to @workflow/core
(NotInWorkflowContextError, NotInStepContextError,
NotInWorkflowOrStepContextError, UnavailableInWorkflowContextError)
and apply them to all twelve user-facing throw sites so errors now
include docs links and a structured "what/why/fix" frame.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Tighten phase 1 changeset to a single sentence (per pranaygp review) and switch to double-quoted frontmatter (per Copilot + repo convention).
- Implement `ansifyName` to actually apply dim styling to workflow/ / step/ prefixes; add an `Ansi.dim` helper to `@workflow/errors` so callers don't need to import chalk directly.
- Remove the `void getWorkflowMetadata;` workaround in context-errors.ts by dropping the unused value import (we only needed the type and symbol).
- Render the plain-Error throw in `workflow/get-workflow-metadata.ts` with `Ansi.frame` + docs link so the VM path matches the structured-class styling from the sibling step path (still uses a plain Error to avoid the module-init cycle).
- Guard `buildUnderline` against zero-length markers so a stray empty token can't produce a negative `String.repeat` count.
Adds a `.child()` and `.forRun(runId, workflowName)` child-logger API to
the structured logger so runtime/step code doesn't have to repeat
`workflowRunId`/`workflowName`/`stepId` on every call. Normalizes error
metadata to structured `errorName` / `errorMessage` / `errorStack` fields
instead of ad-hoc `error: err.message` strings, and adds comments to
silent catches that swallow expected idempotency conflicts.

Also folds in the pending changes from #1812 so that PR can be closed:

- Standardize the console prefix to `[workflow-sdk]`.
- Split the replay-timeout log into a warn-while-retrying vs.
  error-when-giving-up, and surface the underlying error when we can't
  mark a timed-out run as failed.
- Include the error stack in the "Fatal runtime error during workflow
  setup" log and in the top-level user-code workflow error log so the
  stack surfaces in flattened log drains.
- Drop the `[Workflows] "<runId>" - ` prefix from
  `buildWorkflowSuspensionMessage` — the structured logger now attaches
  run context.

Supersedes #1812.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 4 of friendlier errors: introduce a `SerializationError` class with
an optional `hint` and a docs link (workflow-sdk.dev/err/serialization-failed),
and adopt it at every user-facing serialization boundary in @workflow/core:

- Locked ReadableStream at a workflow boundary
- Unregistered class / missing `classId` / missing `WORKFLOW_DESERIALIZE`
- Attempting to return step functions to clients or call workflow functions
  directly
- Webhook `respondWith()` called outside a step
- `dehydrate*` / `getSerializeStream` failures (workflow args/return, step
  args/return, stream chunks)

Internal invariants (format prefix length checks, unknown format bytes,
missing `STREAM_NAME_SYMBOL`, encryption key/size guards, etc.) now throw
`WorkflowRuntimeError` instead of plain `Error` so the classifier and logger
treat them consistently.

`formatSerializationError` now returns `{ message, hint }` so the hint
fragment can be rendered with the standard SerializationError framing
instead of being baked into the message string.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add describeError() that derives attribution and class-aware hints from
existing error classes + RUN_ERROR_CODES — no event data changes. Wire into
step failures, max-delivery exhaustion, run failures, and fatal setup errors
so terminal logs include errorAttribution and a hint for known error types.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- `describeError(err, errorCode?)` now accepts an optional precomputed
  `RunErrorCode`. `classifyRunError(err)` only narrows to USER_ERROR /
  RUNTIME_ERROR, so the REPLAY_TIMEOUT and MAX_DELIVERIES_EXCEEDED branches
  were previously unreachable from the step / run failure log sites.
  Callers that know the failure category (runtime.ts for replay timeout and
  max-deliveries exhaustion) now pass the code in.
- Context-violation checks use `instanceof` against the actual classes from
  context-errors.ts instead of a name-string set. Type-safe + survives
  class renames.
- Wire the new hints through to the REPLAY_TIMEOUT and MAX_DELIVERIES_EXCEEDED
  log sites so those branches actually render a hint now.
- 3 new tests cover the reachable code paths + precomputed-code override.
- Changeset frontmatter switched to double quotes per repo convention.
Internal invariants now use WorkflowRuntimeError so describeError attributes
them to the SDK: missing startedAt, VM generateKey, closure-vars outside
step context, ENOTSUP. defineHook().resume() formats schema validation
failures as a readable list instead of a JSON blob.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Observability renderers read persisted run_failed / step_failed event data,
not live Error instances. describeRunError takes { errorCode, errorName }
and returns the same { attribution, hint } shape as describeError, so the
CLI and web UI can derive user-vs-SDK framing from the event log directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add `WorkflowBuildError` class in `@workflow/errors` with optional `hint`
for an actionable next step, and apply it in `@workflow/builders` at
user-facing sites: failed esbuild phases, unresolved built-in steps, and
empty esbuild output now throw `WorkflowBuildError` with a hint pointing
at the likely fix. Runtime invariants remain plain `Error`.
…docs link, redirect stack

- Drop the readonly `functionName` param-property on context-error classes so
  util.inspect no longer prints a trailing `{ functionName: 'foo()' }` block.
- Replace the `DocLink` ("label: https://…") shape with a plain `DocsUrl`
  template-literal type. Error output now renders a single clean line:
  `docs: https://…` (new `Ansi.docs` helper) instead of the noisier
  "note: Read more about foo(): https://…".
- Add throw helpers (`throwNotInWorkflowContext`, etc.) that call
  `Error.captureStackTrace(err, stackStartFn)` on V8 engines so the top frame
  of the thrown error points at the user's call site instead of at the gate
  function inside the framework. Callers pass themselves as the boundary.
- Refactor `defineHook()` (both root and `/workflow`) to use named function
  closures rather than `this.create`/`this.resume`, since the stack redirect
  relies on a stable function identity that survives destructuring.
- Update context-errors.test.ts to snapshot the new `docs:` framing and to
  add a regression test asserting the top stack frame is the user call site.
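
The stack redirect described in this commit can be sketched as below. Error.captureStackTrace(target, stackStartFn) is a V8 API, so the helper feature-detects it; the gate and call-site functions are hypothetical and exist only to show which frame ends up on top.

```typescript
// Hypothetical sketch of a shared stack-redirect helper.
type CaptureStackTrace = (target: object, stackStartFn?: Function) => void;

function redirectStackToCaller(err: Error, stackStartFn: Function): void {
  // Feature-detect: Error.captureStackTrace exists on V8 engines only.
  const capture = (Error as unknown as { captureStackTrace?: CaptureStackTrace })
    .captureStackTrace;
  if (typeof capture === 'function') {
    // Re-captures the stack, omitting stackStartFn and every frame above it.
    capture.call(Error, err, stackStartFn);
  }
}

// Illustrative gate function standing in for a framework-side check.
function createHookGate(): never {
  const err = new Error('`createHook()` can only be called inside a workflow function');
  // The gate passes itself as the boundary, so its own frame is hidden.
  redirectStackToCaller(err, createHookGate);
  throw err;
}

// Illustrative user code; on V8 this becomes the top frame of err.stack.
function userCallSite(): void {
  createHookGate();
}
```

On engines without Error.captureStackTrace the helper does nothing, so the error keeps its original stack.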
Copilot AI review requested due to automatic review settings April 24, 2026 02:08
@changeset-bot

changeset-bot Bot commented Apr 24, 2026

🦋 Changeset detected

Latest commit: 5b8d791

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 23 packages
Name Type
@workflow/core Patch
@workflow/errors Patch
@workflow/builders Patch
@workflow/utils Patch
@workflow/cli Patch
@workflow/next Patch
@workflow/nitro Patch
@workflow/vitest Patch
@workflow/web-shared Patch
@workflow/web Patch
workflow Patch
@workflow/world-testing Patch
tarballs Patch
@workflow/world-local Patch
@workflow/world-postgres Patch
@workflow/world-vercel Patch
@workflow/astro Patch
@workflow/nest Patch
@workflow/rollup Patch
@workflow/sveltekit Patch
@workflow/vite Patch
@workflow/nuxt Patch
@workflow/ai Patch


@vercel
Contributor

vercel Bot commented Apr 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
example-nextjs-workflow-turbopack Ready Ready Preview, Comment May 4, 2026 4:20am
example-nextjs-workflow-webpack Ready Ready Preview, Comment May 4, 2026 4:20am
example-workflow Ready Ready Preview, Comment May 4, 2026 4:20am
workbench-astro-workflow Ready Ready Preview, Comment May 4, 2026 4:20am
workbench-express-workflow Ready Ready Preview, Comment May 4, 2026 4:20am
workbench-fastify-workflow Ready Ready Preview, Comment May 4, 2026 4:20am
workbench-hono-workflow Ready Ready Preview, Comment May 4, 2026 4:20am
workbench-nitro-workflow Ready Ready Preview, Comment May 4, 2026 4:20am
workbench-nuxt-workflow Ready Ready Preview, Comment May 4, 2026 4:20am
workbench-sveltekit-workflow Ready Ready Preview, Comment May 4, 2026 4:20am
workbench-tanstack-start-workflow Ready Ready Preview, Comment May 4, 2026 4:20am
workbench-vite-workflow Ready Ready Preview, Comment May 4, 2026 4:20am
workflow-docs Ready Ready Preview, Comment, Open in v0 May 4, 2026 4:20am
workflow-swc-playground Ready Ready Preview, Comment May 4, 2026 4:20am
workflow-tarballs Ready Ready Preview, Comment May 4, 2026 4:20am
workflow-web Ready Ready Preview, Comment May 4, 2026 4:20am

@github-actions
Contributor

github-actions Bot commented Apr 24, 2026

🧪 E2E Test Results

All tests passed

Summary

Passed Failed Skipped Total
✅ ▲ Vercel Production 1011 0 67 1078
✅ 💻 Local Development 1090 0 86 1176
✅ 📦 Local Production 1090 0 86 1176
✅ 🐘 Local Postgres 1090 0 86 1176
✅ 🪟 Windows 98 0 0 98
✅ 📋 Other 552 0 36 588
Total 4931 0 361 5292

Details by Category

✅ ▲ Vercel Production
App Passed Failed Skipped
✅ astro 91 0 7
✅ example 91 0 7
✅ express 91 0 7
✅ fastify 91 0 7
✅ hono 91 0 7
✅ nextjs-turbopack 96 0 2
✅ nextjs-webpack 96 0 2
✅ nitro 91 0 7
✅ nuxt 91 0 7
✅ sveltekit 91 0 7
✅ vite 91 0 7
✅ 💻 Local Development
App Passed Failed Skipped
✅ astro-stable 92 0 6
✅ express-stable 92 0 6
✅ fastify-stable 92 0 6
✅ hono-stable 92 0 6
✅ nextjs-turbopack-canary 79 0 19
✅ nextjs-turbopack-stable 98 0 0
✅ nextjs-webpack-canary 79 0 19
✅ nextjs-webpack-stable 98 0 0
✅ nitro-stable 92 0 6
✅ nuxt-stable 92 0 6
✅ sveltekit-stable 92 0 6
✅ vite-stable 92 0 6
✅ 📦 Local Production
App Passed Failed Skipped
✅ astro-stable 92 0 6
✅ express-stable 92 0 6
✅ fastify-stable 92 0 6
✅ hono-stable 92 0 6
✅ nextjs-turbopack-canary 79 0 19
✅ nextjs-turbopack-stable 98 0 0
✅ nextjs-webpack-canary 79 0 19
✅ nextjs-webpack-stable 98 0 0
✅ nitro-stable 92 0 6
✅ nuxt-stable 92 0 6
✅ sveltekit-stable 92 0 6
✅ vite-stable 92 0 6
✅ 🐘 Local Postgres
App Passed Failed Skipped
✅ astro-stable 92 0 6
✅ express-stable 92 0 6
✅ fastify-stable 92 0 6
✅ hono-stable 92 0 6
✅ nextjs-turbopack-canary 79 0 19
✅ nextjs-turbopack-stable 98 0 0
✅ nextjs-webpack-canary 79 0 19
✅ nextjs-webpack-stable 98 0 0
✅ nitro-stable 92 0 6
✅ nuxt-stable 92 0 6
✅ sveltekit-stable 92 0 6
✅ vite-stable 92 0 6
✅ 🪟 Windows
App Passed Failed Skipped
✅ nextjs-turbopack 98 0 0
✅ 📋 Other
App Passed Failed Skipped
✅ e2e-local-dev-nest-stable 92 0 6
✅ e2e-local-dev-tanstack-start-stable 92 0 6
✅ e2e-local-postgres-nest-stable 92 0 6
✅ e2e-local-postgres-tanstack-start-stable 92 0 6
✅ e2e-local-prod-nest-stable 92 0 6
✅ e2e-local-prod-tanstack-start-stable 92 0 6

📋 View full workflow run

pranaygp added 2 commits May 1, 2026 10:06
Replace util.inspect's default object dump (which quote-escapes
multi-line stacks and paragraph hints into a single-line JSON-y blob)
with a workflow-aware formatter that composes the entire log line
into a single string passed to console.error / console.warn.

Highlights of the new output:
- Per-run / per-step IDs render with their parsed friendly names so
  users see `wrun_… · simple (./workflows/1_simple)` instead of just
  the raw `workflowName: 'workflow//./workflows/1_simple//simple'`.
- Color-coded attribution badge (user error red / sdk error magenta)
  paired with the error class in bold.
- Hints render as a paragraph under `hint:` rather than a backslash-
  `\n`-escaped string.
- Drops redundant fields (errorStack always; errorMessage when it's
  already in the parent message) to avoid double-printing.
- Unknown fields fall through as a sorted `key  value` tail so we
  never silently drop log information.

@workflow/errors/ansi gains bold/red/magenta helpers used by the
formatter. The web / web-shared packages don't consume stderr — they
read structured event payloads from the World event log — so this is
presentation-only at the runtime layer.

Resolve conflicts:
- packages/builders/src/base-builder.ts — keep WorkflowBuildError import
  alongside main's removal of unused usesVercelWorld.
- packages/core/src/serialization.ts — main extracted serialization into
  modular ./serialization/{client,step,workflow}.ts and reformatted the
  error wrapper. Re-apply the SerializationError adoption on top of the
  new structure:
  - ./serialization/errors.ts now returns { message, hint } so callers
    can pass the hint through SerializationError instead of baking it
    into the message.
  - The 3 modular serializers (client/step/workflow) and 4 dehydrate/
    hydrate wrappers in serialization.ts now throw SerializationError
    with a hint, unwrapping any inner SerializationError cause to keep
    the more specific outer context label ("workflow arguments" vs
    generic "workflow value").
  - User-facing throws (ReadableStream is locked / step function not
    found / workflow functions cannot be called directly / respondWith
    outside step / step functions cannot be deserialized in client
    context) are now SerializationError with hints.
  - Internal invariants (stream-name validation, missing
    STREAM_NAME_SYMBOL, closure-vars outside step context) are now
    WorkflowRuntimeError so they get classified consistently.
Member

@TooTallNate TooTallNate left a comment

Read through the consolidated stack carefully: context-violation classes, capture-stack helper, describe-error, the errors-package additions and ANSI subpath, the logger rewrite, and the step-handler dehydrate move. The overall direction is good and a lot of it is well-executed; the structured FramedContent / renderPlain / renderPretty split is the right architecture for "plain message + pretty terminal rendering" and the subpath split for chalk is correct. It lands on most of the right tradeoffs.

I have one blocking item, plus a handful of follow-up concerns and observations to raise before merge.

Blocking

pr-artifacts/ directory needs to be removed before merge

Six files are still in pr-artifacts/ (01-... through 05-... plus README). The PR description explicitly says "Remove before merge", so they shouldn't ship to git history. It's an easy git rm -r pr-artifacts/, but it's the kind of thing that gets forgotten in the merge rush.

Higher-risk concerns

FatalError.is() widening is a runtime-behavior change

The static is() predicate now returns true for any Error carrying a truthy fatal property — not just name === 'FatalError'. This is the documented intent and enables SerializationError, ContextViolationError, etc. to opt in without inheritance. But it changes retry semantics — anything FatalError.is() returns true for skips retry budget.

Risk: a user-defined error class that incidentally has fatal = true (someone's class CustomError { fatal = true }) will silently become non-retryable in workflow runs. For most users this is fine; for anyone who rolled their own "fatal" convention, this is a surprise.

Two requests:

  1. The friendlier-errors-consistency changeset should be minor, not patch, for @workflow/errors. Adding new error classes is minor; widening a public is() predicate is at minimum minor. Verify.
  2. Add a test that asserts a non-Error POJO with { fatal: true } does NOT pass FatalError.is() (the isError guard handles this, but lock it in).
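
The widened predicate and the guard requested in item 2 can be sketched roughly as below. This is only an illustration under assumptions, not the actual @workflow/errors code: `isErrorLike` and `isFatal` are hypothetical names, and the real isError guard may check different properties.

```typescript
// Hypothetical stand-in for the SDK's isError guard: accepts anything that
// looks like an Error (string `name` and `message`), so a bare POJO fails.
function isErrorLike(value: unknown): boolean {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { name?: unknown }).name === 'string' &&
    typeof (value as { message?: unknown }).message === 'string'
  );
}

// Sketch of the widened predicate: any error-like value that carries
// `fatal === true` is treated as fatal, with no inheritance required.
function isFatal(value: unknown): boolean {
  return isErrorLike(value) && (value as { fatal?: unknown }).fatal === true;
}

// A user-defined class that opts in (or trips over the convention by accident).
class CustomError extends Error {
  readonly fatal = true;
}

const fatalUserError = isFatal(new CustomError('boom'));
const plainError = isFatal(new Error('boom'));
const pojo = isFatal({ fatal: true });
```

Under these assumptions the user-defined class is classified as fatal, while a plain `Error` and a `{ fatal: true }` object literal are not, which is the behavior the requested test would lock in.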

Step-handler dehydrate move is correct but undertested

The "move dehydrate inside the user-code try/catch" change is the highest-leverage runtime-behavior change in this PR (along with the FatalError.is widening above). I traced it carefully and the new flow is correct: dehydrate failures now route through userCodeError → fatal branch → step_failed → no retry. The if (!userCodeFailed) guard is right; the new try/catch funnels into the same handler block. Confirmed end-to-end:

  • dehydrateStepReturnValue only ever throws SerializationError (own wrapping catch in serialization.ts)
  • SerializationError has readonly fatal = true
  • FatalError.is(SerializationError) is true via the new widening
  • Routes to fatal branch → step_failed event → no step_retrying

But step-handler.test.ts doesn't test this path at all. The test diff only updates mock shapes for the new forRun/child logger methods. The dehydrate mock at lines 118-127 is mockResolvedValue(...) — it never rejects. Add a test where it rejects with SerializationError and assert step_failed is emitted exactly once and step_retrying is never emitted. This is the regression-gate test the PR most needs and doesn't have.
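
The routing described above can be modelled in a few lines. The sketch below is a simplified, synchronous stand-in for the real handler, written only to show the shape of the missing test: `runAttempt`, `SerializationErrorSketch`, and the 1-indexed `attempt` argument are all hypothetical, and the real step handler is asynchronous and emits events rather than returning them.

```typescript
type StepEvent = 'step_completed' | 'step_retrying' | 'step_failed';

// Hypothetical stand-in for SerializationError: it only needs `fatal = true`.
class SerializationErrorSketch extends Error {
  readonly fatal = true;
}

function isFatal(err: unknown): boolean {
  return err instanceof Error && (err as { fatal?: unknown }).fatal === true;
}

// One attempt of a step. `attempt` is 1-indexed, so with maxRetries = 3 the
// fourth attempt is the last one.
function runAttempt(
  userStep: () => unknown,
  dehydrate: (value: unknown) => unknown,
  attempt: number,
  maxRetries: number
): StepEvent {
  try {
    const result = userStep();
    // dehydrate runs inside the same try as user code, so a serialization
    // failure is routed through the same catch block.
    dehydrate(result);
    return 'step_completed';
  } catch (err) {
    if (isFatal(err) || attempt > maxRetries) return 'step_failed';
    return 'step_retrying';
  }
}

const rejectingDehydrate = (): unknown => {
  throw new SerializationErrorSketch('cannot serialize return value');
};

// A dehydrate failure on the first attempt fails the step without a retry.
const dehydrateFailure = runAttempt(() => 1, rejectingDehydrate, 1, 3);
```

A real test would replace the return value with assertions on the mocked event emitter: step_failed called once, step_retrying never called.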

There's also one subtle behavior change worth calling out: errors from trace() setup itself (OTEL config, etc.) used to escape to the queue (48 redeliveries). They now get caught by the new inner try/catch and treated as user-step failures (4 attempts, attributed to user). Probably fine — OTEL setup failures are deterministic — but worth a sentence in the changeset.

Other observations (non-blocking)

Minor design

  • hint is baked into .message in WorkflowBuildError and SerializationError (super(`${message}\n\n${options.hint}`)). Future renderers that want to format hints separately can't recover the bare message. Consider keeping super(message) plain and exposing hint separately; let renderers compose. Not blocking, but locks in a format.
  • Box-drawing implemented twice: renderPlain in context-violation-error.ts re-implements the framing structure of Ansi.frame to avoid pulling chalk into core. Defensible for the dependency reason (and the comment is clear), but the two implementations can drift. Worth a cross-reference comment in both places.
  • Eight new throw helpers (throwNotInWorkflowContext etc.) for four classes. Could be one generic throwContextViolation(Class, fnName, docsUrl, stackStartFn). Current shape is more autocomplete-friendly; defensible either way.
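
To make the first bullet concrete, here is one way the "plain message, separate hint" shape could look. It is a sketch of the suggestion only, not the PR's code: `BuildErrorSketch` and `renderPlain` are hypothetical names, and the framed `╰▶ hint:` prefix is borrowed from the rendering style used elsewhere in this PR.

```typescript
// Hypothetical error class: the message passed to super() stays bare and the
// hint travels as its own field.
class BuildErrorSketch extends Error {
  readonly hint: string | undefined;

  constructor(message: string, options: { hint?: string } = {}) {
    super(message);
    this.name = 'BuildErrorSketch';
    this.hint = options.hint;
  }
}

// A renderer composes the two pieces. Other renderers (web UI, JSON logs)
// could format the hint differently because the bare message is recoverable.
function renderPlain(err: BuildErrorSketch): string {
  return err.hint ? `${err.message}\n╰▶ hint: ${err.hint}` : err.message;
}

const err = new BuildErrorSketch('Build failed', { hint: 'Check your imports' });
const rendered = renderPlain(err);
```

The trade-off is that anything reading only `.message` (for example a persisted event) no longer sees the hint unless it is serialized separately.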

Logger / log-format

  • ANSI bytes can leak into stderr drains. formatLogMetadata uses Ansi.bold / Ansi.red etc. which call chalk directly. Tests pass because vitest has no TTY (chalk level 0 → pass-through). But if anyone runs pnpm dev with FORCE_COLOR=1, or stderr is piped through a wrapper that is a TTY upstream of a structured drain, ANSI bytes land in the message field. The OTel span event path (addEvent({ message, ...merged }) in logger.ts:73) is clean — it uses the unformatted message. But the console path is environment-dependent. Suggest using a chalk.Instance({ level: 0 }) for log formatting, or stripping ANSI before stderr write. Not blocking, but worth knowing.
  • stepLogger.* callsites bypass forRun-based scoping. step-handler.ts has 6 callsites (lines 371, 591, 644, 711, 764, 778) that use the namespace-step stepLogger and manually re-pass { workflowRunId, workflowName, stepId, stepName } each time. The runtimeLogger.forRun(...) migration is great where it's done; these bypass it. Could be stepLogger.forRun(...).child({ stepId, stepName }).
  • Lowercase [workflow] prefix still exists in runtime/world.ts:90, next/src/loader.ts:648, next/src/builder-deferred.ts:153/174/1010. These are direct console.* calls (not logger), so technically out of scope for the prefix-standardization changeset, but worth noting if "all SDK output prefixed with [workflow-sdk]" is the goal.
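
For the ANSI-leak bullet, the "strip before stderr write" option could be as small as the sketch below. `writeLogLine` and its `color` flag are hypothetical; the only real API used is Node's `stripVTControlCharacters` from `node:util`. How the flag gets decided (TTY detection, an env var, a logger option) is left open here.

```typescript
import { stripVTControlCharacters } from 'node:util';

// Hypothetical final write step for the console path: when color is not
// explicitly wanted, escape sequences are removed regardless of what chalk
// decided upstream (FORCE_COLOR, TTY wrappers, and so on).
function writeLogLine(
  line: string,
  write: (text: string) => void,
  color: boolean
): void {
  write(color ? line : stripVTControlCharacters(line));
}

// Example: a red "user error" badge as chalk would emit it at color level 1+.
const colored = '\u001B[31muser error\u001B[39m wrun_123';
const captured: string[] = [];
writeLogLine(colored, (text) => captured.push(text), false);
```

With `color` set to false the captured line contains only the visible text, so a structured drain downstream never sees raw escape bytes.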

Module-realm hazards

  • describe-error.ts mixes Class.is(err) (duck-typed; survives ESM/CJS dual-package hazards) with err instanceof WorkflowRuntimeError (nominal; doesn't survive). Recommend .is() everywhere for consistency in a multi-realm ecosystem.
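
A small illustration of why the nominal check is fragile. The factory below stands in for the same module being loaded twice (for example once as ESM and once as CJS); `loadModuleCopy` and `WorkflowRuntimeErrorSketch` are hypothetical names, and the name-based `is()` is an assumption about how a duck-typed check might be written, not the SDK's implementation.

```typescript
// Each call simulates one copy of the module, which defines its own class
// object with the same name.
function loadModuleCopy() {
  return class WorkflowRuntimeErrorSketch extends Error {
    constructor(message: string) {
      super(message);
      this.name = 'WorkflowRuntimeErrorSketch';
    }

    // Duck-typed check: relies on the name, not on class identity.
    static is(value: unknown): boolean {
      return value instanceof Error && value.name === 'WorkflowRuntimeErrorSketch';
    }
  };
}

const EsmCopy = loadModuleCopy();
const CjsCopy = loadModuleCopy();

// Thrown by code that imported one copy, checked by code that imported the other.
const thrown = new EsmCopy('boom');
const nominalMatch = thrown instanceof CjsCopy;
const duckMatch = CjsCopy.is(thrown);
```

The nominal check misses the error because the two classes are distinct objects, while the duck-typed check still recognizes it.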

Minor

  • inline in errors/src/ansi.ts (~100 lines for the tagged template that builds Rust-compiler-style underlines) is impressively complex. Worth checking it has snapshot tests for: no markers, single marker, multiple markers, marker at end of line, color-on vs color-off, multi-line input. Skimmed the test file; coverage is thinner than the implementation deserves.
  • The "PR description" mentions esbuild.context().rebuild() failures bypass the new WorkflowBuildError path. The most common build-error case (unresolved imports during HMR) won't get the friendly framing. Worth a follow-up issue.

Doc

  • The docs don't say whether error message strings are stable API. The new describeError function consumes err.name and class identity, which suggests messages aren't API. Worth adding a CONTRIBUTING note.

To summarize what I'd want before merge:

  1. Remove pr-artifacts/.
  2. Add the regression-gate test for dehydrateStepReturnValue rejecting in step-handler.test.ts.
  3. Verify changeset bumps for the FatalError.is widening are correct (minor at least).
  4. Add the isError guard test locking in that { fatal: true } POJOs are not classified as fatal.

Everything else is non-blocking and could be follow-ups.

Member

@VaguelySerious VaguelySerious left a comment

AI review: blocking issues found

return tail
? `${this.name}: ${pretty}\n${tail}`
: `${this.name}: ${pretty}`;
}
Member

AI Review: Blocking

[util.inspect.custom] duplicates every detail line in the rendered output. Repro on the built dist:

import { inspect } from 'node:util';
import { NotInWorkflowContextError } from '@workflow/core/...';
console.log(inspect(new NotInWorkflowContextError('createHook()', 'https://example.com/docs')));

produces:

NotInWorkflowContextError: `createHook()` can only be called inside a workflow function
╰▶ docs: https://example.com/docs
╰▶ docs: https://example.com/docs        ← duplicated
    at userCallSite (...)

For UnavailableInWorkflowContextError (3 detail lines, with active workflow context), all three render twice.

Root cause: .message is multi-line (title\n╰▶ docs: …), so V8's .stack is Name: messageLine1\nmessageLine2\nmessageLine3\n at …. split('\n').slice(1).join('\n') only strips line 1, gluing the remaining message lines onto the prepended pretty form.

Suggested fix:

const messageLines = this.message.split('\n').length;
const tail = (this.stack ?? '').split('\n').slice(messageLines).join('\n');

This is the primary terminal display path users see (console.error(err), framework dev overlays, uncaught throws), so worth fixing before merge. The existing 'util.inspect(err) reveals the pretty framed form' test only asserts .toContain('╰▶') and won't catch this — recommend adding a not.toMatch(/╰▶ docs:.*╰▶ docs:/s) or a toMatchInlineSnapshot with masked stack frames.
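
The root cause and the suggested fix can be reproduced outside the SDK. The sketch below assumes V8's stack format (the header line followed by the remaining message lines, then the `at …` frames); `FramedErrorSketch`, `buggyTail`, and `fixedTail` are hypothetical names used only for this illustration.

```typescript
// Hypothetical error with a two-line message, mimicking the framed form.
class FramedErrorSketch extends Error {
  constructor(title: string, docsUrl: string) {
    super(`${title}\n╰▶ docs: ${docsUrl}`);
    this.name = 'FramedErrorSketch';
  }
}

// Original approach: drops only the first line of .stack, so the second
// message line survives in the "tail".
function buggyTail(err: Error): string {
  return (err.stack ?? '').split('\n').slice(1).join('\n');
}

// Suggested fix: drop as many lines as the message actually spans.
function fixedTail(err: Error): string {
  const messageLines = err.message.split('\n').length;
  return (err.stack ?? '').split('\n').slice(messageLines).join('\n');
}

const framed = new FramedErrorSketch(
  '`createHook()` can only be called inside a workflow function',
  'https://example.com/docs'
);
const leaksDetailLine = buggyTail(framed).includes('╰▶ docs:');
const cleanTail = !fixedTail(framed).includes('╰▶ docs:');
```

Prepending the pretty form to `fixedTail(err)` then yields each detail line once, while `buggyTail(err)` would repeat it.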

Contributor Author

Fixed in 9d45cdf — counted the actual message line count instead of slicing only line 1. Added a regression test in context-errors.test.ts that asserts ╰▶ docs: appears exactly once and that no detail line repeats.


constructor(message: string, options?: WorkflowBuildErrorOptions) {
const body = options?.hint ? `${message}\n\n${options.hint}` : message;
super(body, { cause: options?.cause });
Member

AI Review: Note

This super(body, { cause: options?.cause }) call (and the equivalent in SerializationError below) hits WorkflowError's constructor, which unconditionally does this.cause = options?.cause. That makes cause: undefined an enumerable own property on every instance, so util.inspect(err) of a no-cause error renders { cause: undefined, ... }.

Not new to this PR — WorkflowError already has the issue — but the new error classes added here inherit it, so the leak surfaces on more classes post-merge.

If you'd rather not address it in this PR, fine — but a one-liner if (options?.cause !== undefined) this.cause = options.cause; in WorkflowError's constructor would clean up the inspect output for all subclasses.

Contributor Author

Fixed in 9d45cdf — wrapped the assignment in if (options?.cause !== undefined). super(message, { cause }) already sets a non-enumerable .cause when provided, so the only thing my code needed to drop was the unconditional re-assignment that turned it into an enumerable own property. Cleans up util.inspect(err) output for every WorkflowError subclass. Changeset entry added at .changeset/workflow-error-cause-undefined.md.
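
The difference between the two constructor shapes shows up in `Object.keys`. The sketch below is an approximation rather than the WorkflowError source: `LeakyError` and `CleanError` are hypothetical, and instead of relying on the two-argument `super(message, { cause })` form, `CleanError` defines the non-enumerable property by hand to mimic what the native constructor does.

```typescript
// Before: an unconditional assignment creates an enumerable own `cause`
// property, even when no cause was passed.
class LeakyError extends Error {
  constructor(message: string, options?: { cause?: unknown }) {
    super(message);
    Object.assign(this, { cause: options?.cause });
  }
}

// After: only attach `cause` when one is provided, and keep it non-enumerable
// (hand-rolled here to mirror the native Error constructor's behavior).
class CleanError extends Error {
  constructor(message: string, options?: { cause?: unknown }) {
    super(message);
    if (options?.cause !== undefined) {
      Object.defineProperty(this, 'cause', {
        value: options.cause,
        enumerable: false,
        writable: true,
        configurable: true,
      });
    }
  }
}

const leakyKeys = Object.keys(new LeakyError('no cause'));
const cleanKeys = Object.keys(new CleanError('no cause'));
```

Enumerable own properties are what inspection and spread-style logging pick up, so keeping `cause` off that list for no-cause instances is what removes the noise.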

'wrun_test123',
expect.any(String),
expect.objectContaining({ stepId: 'step_abc' })
);
Member

AI Review: Note

The most user-visible behavior change in this PR — "fatal errors fail in 1 attempt, not 4" — has unit coverage on FatalError.is(ContextViolationError) === true and SerializationError.fatal === true, but no end-to-end test on the step handler itself asserting that a fatal user error emits exactly one step_failed event (no retries) and a non-fatal Error does retry to maxRetries.

If someone removes fatal = true from ContextViolationError (or moves the dehydrate call back outside the user-code try/catch) later, the unit tests stay green and the regression returns silently. A small integration-style test exercising the retry loop with a fatal vs. non-fatal user error would be a cheap regression gate.

Contributor Author

Done in 9d45cdf — added a step-handler fatal vs retryable behavior describe block in step-handler.test.ts that exercises the live retry-loop wiring. Three cases:

  1. error with fatal: true → exactly one step_failed, zero step_retrying
  2. non-fatal first-attempt Error → step_retrying once, no step_failed
  3. non-fatal final attempt → step_failed once

Catches the case where fatal = true is removed from a context-violation class — the FatalError.is() unit tests would stay green but this test would flip from "1 step_failed" to "4 step_retrying then 1 step_failed."

message.includes(metadata.errorMessage as string)
) {
redundant.add('errorMessage');
}
Member

AI Review: Nit

message.includes(metadata.errorMessage) is brittle: it works for the current call sites that pass ${framing}\n${stack} (which embeds the full message), but a future caller that passes a truncated errorMessage field and a full message would fail to dedupe and double-print. Not actionable today, just flagging if snapshot tests start drifting later.

Contributor Author

Acknowledged. Agree it's a heuristic that depends on current call sites. Leaving as-is for now since all callers compose the message via ${framing}\n${stack} (and the stack always begins with the error name + message), but if a future caller passes a truncated errorMessage we'd need to switch to a more explicit "already-rendered fields" set.

},
"dependencies": {
"@workflow/utils": "workspace:*",
"chalk": "5.6.2",
Member

AI Review: Nit

The PR description says the change "no longer pulls chalk into every consumer" — accurate at the bundle level (the main entry doesn't import chalk, so tree-shaken consumers don't pay), but chalk is still in dependencies so it lands in node_modules for every install. Minor framing thing — if you wanted to avoid the install footprint for consumers that never touch ./ansi, this would have to move to optionalDependencies or peerDependencies. Probably not worth it; just calling out the gap between the description and the install reality.

Contributor Author

Good catch on the framing. Updating to "no longer pulls chalk into the bundle for consumers that don't import the ./ansi subpath." Moving chalk out of dependencies to peerDependencies / optionalDependencies is more disruptive (anyone importing ./ansi would have to install chalk explicitly), so leaving it as-is — the install footprint trade-off is fine for the SDK's audience.

pranaygp added 4 commits May 2, 2026 19:34
…lier-errors-followups

* origin-https/main:
  [vitest] [world-local] Fix local-world data recovery isolation (#1895)
  ci: switch Vercel deployment-protection bypass to OIDC Trusted Sources (#1882)
  Add additional tests for event consumer fixes for hook/sleep/step race conditions (#1528)
The job never runs `pnpm install` (it just calls `node` against a
checked-in script), so the pnpm store path never exists. The post-job
`actions/setup-node@v4` cache-save then fails with `Path Validation
Error: Path(s) specified in the action for caching do(es) not exist`
and red-X's the entire job even though the matrix step succeeded.

The setup-workflow-dev composite already has a `cache-pnpm` opt-out
input for this exact case — wire it through here.
- ContextViolationError: util.inspect(err) duplicated every framed detail
  line because the stack-tail strip only sliced the first message line.
  V8's Error.stack reads `Name: messageLine1\n  messageLine2\n  at ...`,
  so for our multi-line `title\n╰▶ docs: …` messages every detail line
  was getting prepended twice (once in the pretty form, once via the
  unsliced message tail). Count the actual message lines and slice past
  all of them. Repro test asserts `╰▶ docs:` appears exactly once.

- WorkflowError: stop assigning `cause: undefined` as an enumerable own
  property when no cause is provided. Subclasses (every error in this PR)
  inherit the parent constructor; the unconditional assignment polluted
  `util.inspect(err)` output with `{ cause: undefined, … }` on every
  no-cause instance. The `super(...)` call already conditionally sets
  `.cause` non-enumerably when `options.cause` is provided.

- step-handler.test.ts: add a regression-gate suite that exercises the
  fatal-vs-retryable retry-loop wiring directly. Asserts that an error
  with `fatal: true` produces exactly one `step_failed` event with no
  `step_retrying`, and that a non-fatal `Error` retries via
  `step_retrying` on early attempts and emits `step_failed` once the
  retry budget is exhausted. Catches the silent-regression case where
  `fatal = true` is removed from a context-violation error class but
  the `FatalError.is()` unit tests stay green.
Member

@VaguelySerious VaguelySerious left a comment

Looks good enough for me to ship and test on 5.x.x

Can we drastically shorten the changesets? I think this should be a single changeset on core/error packages that explains the user-facing changes mostly, not a file-by-file breakdown of what's changed. Also let's remove PR artifacts. Then good to ship.

Address review feedback to drastically shorten the changesets — fold
the 15 file-by-file entries into a single user-facing changeset for
@workflow/core / errors / builders / utils. Also drop the pr-artifacts/
folder (reviewer-only log captures, no longer needed).
@pranaygp
Contributor Author

pranaygp commented May 3, 2026

Done in 351971a:

  • Consolidated all 15 changesets into a single friendlier-errors entry covering @workflow/core / errors / builders / utils. User-facing summary, no file-by-file breakdown.
  • Removed pr-artifacts/.

Ready to ship.

Comment thread .changeset/friendlier-errors.md Outdated
Comment thread .changeset/pretty-log-format.md Outdated
pranaygp and others added 8 commits May 4, 2026 10:47
Five user-driven fixes from manual smoke-testing of #1849:

1. Logger layout. composeLogLine() now puts the structured-fields block
   (attribution badge, run/step IDs, error code) **between** the framing
   line and the stack body, instead of after it where 30+ lines of stack
   buried the most useful information. The framing stays at the top,
   stack at the bottom, structured info readable at a glance.

2. Stack trim. Drops framework-internal frames (`node_modules/.pnpm/`,
   `node:internal/`, Turbopack-bundled `node_modules__pnpm_*` chunks,
   `_next_dist_*` chunks) and caps the surviving frame count at 6
   so the stack stays compact even on heavy async wrappers. Suppressed
   runs emit one summary line so users know the trim happened.

3. Wrapper-route noise. The nextjs-turbopack workbench's start route
   was catching `WorkflowRunFailedError` rejection on
   `Promise.race([readLoop(), run.returnValue])` and re-logging it via
   `console.error('Error in workflow stream:', error)` plus
   `controller.error(error)` — which then triggered Next.js's
   `⨯ failed to pipe response` overlay. The SDK already logs the
   failure cleanly upstream and the runId is on the response header, so
   the wrapper now closes the SSE stream cleanly on
   WorkflowRunFailedError.

4. Consistent framed `╰▶ hint:` / `╰▶ docs:` layout for all errors
   that carry a hint or docs slug. WorkflowError, SerializationError,
   and WorkflowBuildError now share one `appendFramedDetails` helper
   matching the box-drawing structure that ContextViolationError
   already used. Was: blank-line-separated `Learn more: <url>`. Now:
   one tree, indistinguishable from context-violation rendering.

5. Drop the duplicate logger-side `hint` field. Hints now live on the
   error message only — actionable hints get serialized into the event
   log, rehydrated on the workflow side, and shown in observability
   automatically. The previous logger-only hint duplicated stderr but
   never made it past the step boundary.

   Updated SerializationError hint to point at the foundations doc
   ("Ensure you're returning workflow serializable types. Check the
   serialization docs to see what's serializable:
   https://workflow-sdk.dev/docs/foundations/serialization") instead
   of the hardcoded `(plain objects, arrays, primitives, …)` list,
   which drifted out of sync as the supported types grew. Same hint
   reuses for step args, workflow args/return, stream messages, and
   any other site that goes through `formatSerializationError`.
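
The stack trim in item 2 can be approximated as a filter plus a cap. This is a simplified sketch, not the SDK's formatter: `trimStackFrames` is a hypothetical name, the summary wording is invented, and it emits a single summary at the end rather than one per suppressed run. The four patterns and the cap of 6 come from the description above.

```typescript
// Substrings that mark a frame as framework-internal, per the commit message.
const INTERNAL_FRAME_PATTERNS = [
  'node_modules/.pnpm/',
  'node:internal/',
  'node_modules__pnpm_',
  '_next_dist_',
];

const MAX_FRAMES = 6;

// Takes the `at …` lines of a stack and returns the lines to print.
function trimStackFrames(frames: string[]): string[] {
  const userFrames = frames.filter(
    (frame) => !INTERNAL_FRAME_PATTERNS.some((pattern) => frame.includes(pattern))
  );
  const kept = userFrames.slice(0, MAX_FRAMES);
  const hidden = frames.length - kept.length;
  // Hypothetical summary line so readers know frames were dropped.
  return hidden > 0 ? kept.concat([`    (${hidden} frames hidden)`]) : kept;
}

const trimmed = trimStackFrames([
  '    at myStep (./workflows/1_simple.ts:12:9)',
  '    at run (node_modules/.pnpm/some-pkg/index.js:1:1)',
  '    at processTicksAndRejections (node:internal/process/task_queues:95:5)',
  '    at handler (./app/api/route.ts:4:3)',
]);
```

In this example the two user frames survive and the two internal frames collapse into one summary line.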

Also retitled the retry summary `3 retries` → `3 max retries` since
"3 retries" next to "4 attempts" was ambiguous (already-happened vs.
budget).

- ErrorStackBlock (web observability): show just the first non-empty
  trimmed line of the error message in the card title with single-line
  truncation. Multi-line messages (`Failed to serialize step return
  value\n╰▶ hint: …`) were rendering the entire framed body in the
  title, pushing the copy button off-screen and burying the
  scannability of the headline. Full message stays in the body via
  the stack (V8 prepends `Name: message` to `Error.stack`), so no
  information is lost; hover-tooltip exposes the full title text.

- Persisted error message: drop the `Step "step//./.../foo"` machine
  name from `Step failed after N retries: …` and `Step exceeded max
  retries (…)` strings. Observability already attributes the event
  to a specific step via the UI tree, and the CLI logger emits the
  friendly `Step foo (./...) hit max retries` framing on its own
  line. Embedding the raw `step//./...` machine name in the persisted
  message text was duplicate noise.
Co-authored-by: Peter Wielander <mittgfu@gmail.com>
Signed-off-by: Pranay Prakash <pranay.gp@gmail.com>
Co-authored-by: Peter Wielander <mittgfu@gmail.com>
Signed-off-by: Pranay Prakash <pranay.gp@gmail.com>
…lier-errors-followups

* origin-https/main:
  [workbench] Add TanStack Start workbench and tests (#1875)
  Atomically dedupe duplicate step_created/wait_created events in world-local (#1877)
  Split tarball hosting out of docs into its own project (#1893)
  Replace fixed-sleep hook waits with event-driven waitForHook helper (#1879)

# Conflicts:
#	pnpm-lock.yaml
…wups' into pranaygp/friendlier-errors-followups

* origin/pranaygp/friendlier-errors-followups:
  Update .changeset/pretty-log-format.md
  Update .changeset/friendlier-errors.md
The class no longer attaches a slug-based `╰▶ docs:` line — the
foundations URL is embedded directly in the hint via the
`formatSerializationError` helper in @workflow/core. Update the test
expectations accordingly:

- bare-title case is now a single line (no docs link)
- hint case renders one `╰▶ hint: …` branch (no second branch)

Four `should throw error for an unsupported type` cases were still
asserting on the old hardcoded type list. Update to the new hint
phrasing that points at the foundations doc, matching the change in
`formatSerializationError` (`packages/core/src/serialization/errors.ts`).
Comment on lines +77 to +78
'`crypto.subtle.generateKey()` is not available inside a workflow function. Move key generation to a step function where full Node.js crypto is available.'
);
Contributor Author

Let's keep this as a not implemented error. generateKey is a web crypto thing I think, not a nodejs function

const firstLine =
message.split('\n').find((line) => line.trim().length > 0) ?? message;
return firstLine.trim();
}
Contributor Author

even if it's just 1 line, we should have a max character length or something so that we don't render very long error messages in the title, and just leave the body to have the full message
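
One possible shape for that cap, building on the snippet above. The function name `errorTitle`, the `MAX_TITLE_CHARS` value of 120, and the trailing ellipsis are all assumptions for illustration; the actual limit would be a design decision for the component.

```typescript
// Hypothetical limit for the card title.
const MAX_TITLE_CHARS = 120;

// First non-empty line of the message, trimmed and capped. The full message
// stays available in the body, so nothing is lost by truncating here.
function errorTitle(message: string, maxChars: number = MAX_TITLE_CHARS): string {
  const firstLine =
    message.split('\n').find((line) => line.trim().length > 0) ?? message;
  const trimmed = firstLine.trim();
  return trimmed.length > maxChars
    ? `${trimmed.slice(0, maxChars - 1)}…`
    : trimmed;
}

const framedTitle = errorTitle(
  '\nFailed to serialize step return value\n╰▶ hint: see the serialization docs'
);
const longTitle = errorTitle('x'.repeat(200), 10);
```

CSS single-line truncation still handles narrow containers; the character cap only bounds what is rendered and exposed in the tooltip.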

Member

@TooTallNate TooTallNate left a comment

Withdrawing my prior REQUEST_CHANGES — both blockers are addressed and the related polish on top is welcome.

What's resolved

  • pr-artifacts/ removed (351971ac08). The 6 files plus README are gone, and the changesets were consolidated into a single friendlier-errors.md + pretty-log-format.md pair. ✓

  • Regression test for fatal-vs-retryable step routing added (9d45cdf6ec, step-handler.test.ts:779-887). Three new tests cover exactly the gap I flagged:

    • emits exactly one step_failed and does not re-queue when the step throws an error with fatal=true — uses a user-defined class FatalUserError extends Error { readonly fatal = true }. This is precisely the case where a third-party error with fatal: true should be treated as fatal — explicitly tested end-to-end through the handler's retry loop.
    • schedules a retry (and does not fail the step) on the first attempt of a non-fatal Error — positive case for the retry path.
    • emits step_failed once the non-fatal retry budget is exhausted — terminal case after maxRetries.

    All three pass on the PR branch when running the full file (820 core tests green).

  • util.inspect dedup fix in context-violation-error.ts (lines 113-127) — the original slice(1) only dropped the first line of .stack, so multi-line .message ended up duplicated when util.inspect() ran. Now slices past messageLineCount lines. There's a regression test in context-errors.test.ts:139-156 that asserts the framed ╰▶ docs: line appears exactly once — solid catch from review.

  • cause leak fix in WorkflowError (errors/src/index.ts:79-87) — this.cause = options?.cause was unconditionally setting the property, making it an enumerable own property even when undefined. The new conditional avoids polluting util.inspect(err) output with { cause: undefined, ... } on every no-cause subclass. Subtle but correct.

Reframing on FatalError.is() widening

Withdrawing my prior concern about the changeset bump type. Confirmed: this is beta-v5-only and not backported to stable, so per AGENTS.md the patch bump is fine — bump type is cosmetic on main in pre-release mode and only matters for backport-to-stable, which doesn't apply here.

The semantic widening makes more sense in context: with the upcoming custom-class serialization work in #1851 routing thrown values through the serialization pipeline, any class with fatal: true opts cleanly into the no-retry path without inheritance gymnastics. That's a clean architecture story and explicitly tested by the new regression case. Approving the design.

Non-blocking observations remaining

These were all flagged previously as non-blocking and have not been addressed. Not gating but worth tracking:

  • 6 stepLogger.* callsites still bypass the scoped stepRuntimeLogger (lines 374, 594, 651, 717, 774, 788 in current step-handler.ts). The stepLogger namespace is intentional for debug:workflow:step:* filtering, so the choice is defensible — but it means the IDs the new forRun factory was designed to centralize get manually re-passed in those 6 places. stepLogger.forRun(...).child({ stepId, stepName }) would close this.
  • Lowercase [workflow] prefix still in runtime/world.ts:90 and a few Next.js builder files. These are direct console.* calls (not logger callsites), so technically out of scope for the prefix-standardization work, but worth aligning eventually.
  • ANSI bytes can leak into stderr drains through formatLogMetadata's Ansi.* calls. The team has explicitly chosen this trade-off (comment in log-format.ts:24 notes "web/web-shared do not consume stderr at all — they read CBOR/JSON event payloads from the World event log"). Acceptable.
  • Test isolation nit in the new describe block. step-handler fatal vs retryable behavior has no beforeAll to initialize capturedHandlerRef; it relies on the sibling step-handler 409 handling block's beforeAll running first. Means the new tests fail when run in isolation (vitest -t "fatal vs retryable"). Easy to fix by duplicating the beforeAll, but only relevant to anyone reordering or pruning the file later.

Wrap-up

Big PR, well-executed end-to-end. The structured FramedContent/renderPlain/renderPretty split, the chalk subpath, the consolidated changeset, the regression tests added in response to review — all good craftsmanship. Ship.

…lier-errors-followups

* origin-https/main:
  Fix pnpm type issue after tanstack PR (#1907)
  ci: pass stale-banner via path: to sticky-pull-request-comment in tests + benchmarks workflows (#1887)
4 participants